Holistic Approach for Classifying and Retrieving Personal Arabic Handwritten Documents
Abstract
This paper presents a novel holistic technique for classifying and retrieving Arabic handwritten text documents. The retrieval of Arabic handwritten documents is performed in several steps. First, the Arabic handwritten document images are segmented into words, and then each word is segmented into its connected parts. Second, several features are extracted from these connected parts and then combined to represent a word with one consolidated feature vector. Finally, a generalized feedforward neural network is used to learn and classify the different styles/fonts into word classes, which are used to retrieve Arabic handwritten text documents.
Key-Words: Data mining of Arabic text, Word recognition, Arabic handwriting, Segmentation of Arabic handwritten documents, Feature extraction, Classification, and Retrieval of Arabic handwritten documents.
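The first two steps of the abstract (splitting a word into its connected parts, then merging per-part features into one vector) can be illustrated with a minimal sketch. The abstract does not name the features or the segmentation algorithm, so everything here is an assumption for illustration only: the function names `connected_parts`, `part_features`, and `word_vector`, the choice of 8-connectivity, the three features (width, height, pixel count), and the `max_parts` padding length are all hypothetical.

```python
from collections import deque


def connected_parts(image):
    """Return the 8-connected foreground regions of a binary image.

    `image` is a list of rows containing 0 (background) or 1 (ink).
    Each region is returned as a list of (row, col) pixels.
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    seen = [[False] * cols for _ in range(rows)]
    parts = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            parts.append(pixels)
    return parts


def part_features(pixels):
    """Hypothetical features for one part: [width, height, pixel count]."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return [max(xs) - min(xs) + 1, max(ys) - min(ys) + 1, len(pixels)]


def word_vector(image, max_parts=4):
    """Concatenate per-part features into one fixed-length word vector."""
    parts = connected_parts(image)
    # Arabic is written right to left, so order parts by rightmost column.
    parts.sort(key=lambda p: -max(x for _, x in p))
    vector = []
    for part in parts[:max_parts]:
        vector.extend(part_features(part))
    # Zero-pad so every word yields a vector of the same length.
    vector.extend([0] * (3 * max_parts - len(vector)))
    return vector


# A toy 3x5 "word" with two separate connected parts.
word = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(word_vector(word))
```

A fixed-length vector of this kind is the sort of input a feedforward classifier expects; the features actually used in the paper are likely richer than the three shown here.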
Similar resources
Classification of Personal Arabic Handwritten Documents
This paper presents a novel holistic technique for classifying Arabic handwritten text documents. The classification of Arabic handwritten documents is performed in several steps. First, the Arabic handwritten document images are segmented into words, and then each word is segmented into its connected parts. Second, several structural and statistical features are extracted from these connected ...
Neural Network Based Segmentation Algorithm for Arabic Characters Recognition
This paper presents a novel holistic technique for classifying Arabic handwritten text documents, which is performed in several steps. First, the Arabic handwritten document images are segmented into their connected parts. A simple heuristic segmentation algorithm is used which finds segmentation points in printed and cursive handwritten words. Second, several features are extracted from the...
W-TSV: Weighted topological signature vector for lexicon reduction in handwritten Arabic documents
This paper proposes a holistic lexicon-reduction method for ancient and modern handwritten Arabic documents. The word shape is represented by the weighted topological signature vector (W-TSV), which encodes graph data into a low-dimensional vector space. Three directed acyclic graph (DAG) representations are proposed for Arabic word shapes, based on topological and geometrical features. Lexicon...
Methods of the Arabic Manuscripts Digitization
The authors acknowledge Saint-Petersburg State University for research grant 2.37.175.2014.
Abstract: The mediaeval Arabic manuscripts are not only valuable artifacts but also one of the major sources of scholarly information in the field of Oriental Studies. This paper discusses the methods of Arabic Manuscripts Digitization. Over the last fifteen years a lot of Arabic manuscri...
Separation of Overlapping and Touching Lines within Handwritten Arabic Documents
In this paper, we propose an approach for the separation of overlapping and touching lines within handwritten Arabic documents. Our approach is based on the morphology analysis of the terminal letters of Arabic words. Starting from 4 categories of possible endings, we use the angular variance to follow the connection and separate the endings. The proposed separation scheme has been evaluated on...